This work expands upon the Beta-to-Release matching proof of concept by validating the technique across versions. The matching model was trained on Firefox Desktop v67 to select Beta profiles that were representative of Release. These matched profiles were then compared against Release on v68 performance and engagement metrics. The results for v68 are similar to those observed for v67, which suggests that this methodology can be used to calculate additional Firefox Release Health metrics derived from the Beta population.
The following two tables show the relative difference between Beta and Release in the training (v67) and validation (v68) datasets, for the mean and the median respectively.
| | CONTENT_PAINT_TIME_CONTENT | TIME_TO_DOM_CONTENT_LOADED_END_MS | MEMORY_TOTAL | TIME_TO_DOM_COMPLETE_MS | FX_PAGE_LOAD_MS_2_PARENT | FX_TAB_SWITCH_TOTAL_E10S_MS | CONTENT_FRAME_TIME_GPU | COMPOSITE_TIME_GPU | TIME_TO_LOAD_EVENT_END_MS | startup_ms |
|---|---|---|---|---|---|---|---|---|---|---|
| pre-matching: v67 | 0.1281559 | 0.2356492 | 0.7393386 | 0.3833827 | 0.1917986 | 0.3048868 | 0.0979446 | 0.2107906 | 0.4134848 | 0.3682498 |
| post-matching: v67 | 0.0054015 | 0.0004005 | 0.6380770 | 0.0532963 | 0.0187386 | 0.0441628 | 0.0229548 | 0.0994379 | 0.0559891 | 0.2795177 |
| pre-matching: v68 | 0.1478562 | 0.3323291 | 0.6108449 | 0.4846121 | 0.2650356 | 0.3245591 | 0.0868224 | 0.0128613 | 0.5148570 | 3.5766275 |
| post-matching: v68 | 0.0214504 | 0.0393589 | 0.6208545 | 0.0709876 | 0.0227569 | 0.0232778 | 0.0126205 | 0.0753851 | 0.0748421 | 3.8479595 |
| | CONTENT_PAINT_TIME_CONTENT | TIME_TO_DOM_CONTENT_LOADED_END_MS | MEMORY_TOTAL | TIME_TO_DOM_COMPLETE_MS | FX_PAGE_LOAD_MS_2_PARENT | FX_TAB_SWITCH_TOTAL_E10S_MS | CONTENT_FRAME_TIME_GPU | COMPOSITE_TIME_GPU | TIME_TO_LOAD_EVENT_END_MS | startup_ms |
|---|---|---|---|---|---|---|---|---|---|---|
| pre-matching: v67 | 0.0702651 | 0.1576861 | 0.5028295 | 0.2033473 | 0.1579308 | 0.1034576 | 0.0400662 | 0.1236804 | 0.2201331 | 0.3451365 |
| post-matching: v67 | 0.0083898 | 0.0332697 | 0.4675830 | 0.0213340 | 0.0126864 | 0.0722753 | 0.0112279 | 0.0850397 | 0.0188157 | 0.1348645 |
| pre-matching: v68 | 0.0777329 | 0.2705283 | 0.4233655 | 0.3147478 | 0.2374249 | 0.1487312 | 0.0317406 | 0.0303801 | 0.3314132 | 0.4706603 |
| post-matching: v68 | 0.0212776 | 0.0172085 | 0.4738329 | 0.0275340 | 0.0072225 | 0.0127165 | 0.0019684 | 0.0670938 | 0.0338250 | 0.2571110 |
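The relative differences reported in the tables above could be computed along the following lines. The exact formula is not stated in this report, so this is only a sketch under an assumption: the absolute difference between the Beta and Release summary statistic, divided by the Release statistic. The function name `relative_difference` and the toy values are hypothetical.

```python
from statistics import mean, median


def relative_difference(beta_values, release_values, stat=mean):
    """Absolute Beta-vs-Release difference in `stat`, relative to Release.

    Assumption: Release is the reference (denominator). The report does
    not state which channel is used as the baseline.
    """
    release_stat = stat(release_values)
    if release_stat == 0:
        raise ValueError("Release statistic is zero; relative difference is undefined")
    return abs(stat(beta_values) - release_stat) / abs(release_stat)


# Toy values standing in for a single metric (e.g. a page-load time in ms).
beta = [120.0, 150.0, 180.0, 210.0]
release = [100.0, 110.0, 130.0, 160.0]

mean_diff = relative_difference(beta, release, stat=mean)
median_diff = relative_difference(beta, release, stat=median)
print(f"mean: {mean_diff:.4f}, median: {median_diff:.4f}")
```

A value close to 0 means the Beta sample resembles Release on that metric. Comparing the pre-matching and post-matching rows shows how much matching narrowed the gap.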
There is significant utility in finding representative Beta populations of Firefox that can give insight into Release before its launch. Previous work showed that statistical matching can find a subset of Beta that is representative of Release with respect to performance metrics. However, the real-world use case is to train the model on one version (here v67), find the matched clients, and then apply that set of clients to a subsequent version. This work attempts to further validate the technique by following this real-world use case.
The code that exported the data is available here. Filters similar to those in the previous work are applied.
The following makes up the training dataset used in statistical matching:
| channel | count |
|---|---|
| beta | 80052 |
| release | 19948 |
The following constitutes the validation dataset:
| channel | count |
|---|---|
| beta | 91502 |
| release | 391806 |
The best-performing model was trained on the v67 dataset. The code that performed the modeling is available here:
The final result of this model is a subset of Beta profiles most representative of Release.
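To illustrate how a matching step can produce such a subset, here is a minimal sketch. This section does not name the matching algorithm, so greedy 1:1 nearest-neighbour matching on standardized covariates is used purely as a stand-in. It should not be read as the model that was actually trained. The function names and the toy covariates are hypothetical.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Scale each covariate column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def match_beta_to_release(beta, release):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Returns the index of the Beta row matched to each Release row, in order.
    """
    scaled = standardize(beta + release)
    beta_s, release_s = scaled[:len(beta)], scaled[len(beta):]
    available = set(range(len(beta)))
    matched = []
    for r in release_s:
        if not available:
            break
        # Squared Euclidean distance in standardized covariate space.
        best = min(
            sorted(available),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(beta_s[i], r)),
        )
        matched.append(best)
        available.remove(best)
    return matched


# Toy covariates per profile: [active_hours, uri_count] (hypothetical values).
beta_profiles = [[1.0, 10.0], [5.0, 50.0], [9.0, 90.0], [2.0, 20.0]]
release_profiles = [[1.2, 12.0], [8.0, 85.0]]

matched_idx = match_beta_to_release(beta_profiles, release_profiles)
print(matched_idx)
```

The matched indices identify the Beta profiles to carry forward. In practice they would be mapped back to client identifiers.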
The next step is to subset the validation v68 dataset by these matched Beta profiles. This reduces the Beta sample size used in the subsequent analysis:
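A minimal sketch of this subsetting step is given below. It assumes each profile record carries a client identifier and a channel; the field names (`client_id`, `channel`), the helper `subset_validation`, and the toy records are all hypothetical. Matched Beta profiles that are absent from the v68 data simply drop out, which is one reason the Beta sample shrinks.

```python
def subset_validation(profiles, matched_ids):
    """Keep every Release profile and only the Beta profiles matched on v67."""
    return [
        p for p in profiles
        if p["channel"] == "release" or p["client_id"] in matched_ids
    ]


# Client IDs of Beta profiles selected by the v67 matching (hypothetical).
matched_beta_ids = {"b1", "b3", "b9"}

# Toy v68 validation records (hypothetical).
v68_profiles = [
    {"client_id": "b1", "channel": "beta", "startup_ms": 900},
    {"client_id": "b2", "channel": "beta", "startup_ms": 1500},
    {"client_id": "b3", "channel": "beta", "startup_ms": 1100},
    {"client_id": "r1", "channel": "release", "startup_ms": 1000},
    {"client_id": "r2", "channel": "release", "startup_ms": 1200},
]

v68_matched = subset_validation(v68_profiles, matched_beta_ids)
beta_after = sum(p["channel"] == "beta" for p in v68_matched)
release_after = sum(p["channel"] == "release" for p in v68_matched)
print(beta_after, release_after)
```

Here `b9` was matched on v67 but does not appear in the v68 data, so it contributes nothing to the validation subset.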
The following plots show the covariate distributions for the following subsets:
NOTE: Guiding lines have been added for the following:
The same set of performance metrics as in the previous analysis was held out from the matching model training and used as a model diagnostic:
The following covariates were used in training the v67 model. Note that the model was trained on numerical encodings of the environment covariates, which have been converted to categories for plotting.
| | active_hours | daily_max_tabs | daily_tabs_opened | search_count | daily_unique_domains | daily_num_sessions_started | num_bookmarks | num_addons | num_active_days | num_pages | uri_count | session_length | profile_age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| pre-matching: v67 | 0.3266487 | 0.3935228 | 0.2595199 | 0.2790857 | 0.0430361 | 0.0515823 | 0.4286822 | 0.2277464 | 0.3533521 | 0.0012293 | 0.3271315 | 0.2661923 | 0.0263616 |
| post-matching: v67 | 0.2682378 | 0.0292538 | 0.0389816 | 0.0874923 | 0.0245319 | 0.0341498 | 0.1172689 | 0.0675241 | 0.3027796 | 0.1151344 | 0.2571066 | 0.2795123 | 0.0021744 |
| pre-matching: v68 | 0.1585895 | 0.3423212 | 0.2416875 | 0.1076707 | 0.0004563 | 0.0461421 | 0.4279918 | 0.2137228 | 0.1752781 | 0.0587843 | 0.1581423 | 0.0956491 | 0.0421692 |
| post-matching: v68 | 0.0879519 | 0.0230131 | 0.0185330 | 0.1117880 | 0.0088909 | 0.0784622 | 0.1348414 | 0.0622400 | 0.0643182 | 0.0895890 | 0.0658028 | 0.0036550 | 0.0193793 |
| | active_hours | daily_max_tabs | daily_tabs_opened | search_count | daily_unique_domains | daily_num_sessions_started | num_bookmarks | num_addons | num_active_days | num_pages | uri_count | session_length | profile_age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| pre-matching: v67 | 0.4538899 | 0.1428571 | 0.1000000 | 0.4 | 0.0000000 | 0.0476190 | 0.0643939 | 0.2 | 0.3333333 | 0.3316079 | 0.4914611 | 0.4298208 | 0.0015267 |
| post-matching: v67 | 0.3589987 | 0.0000000 | 0.0000000 | 0.2 | 0.0000000 | 0.0285714 | 0.0931818 | 0.0 | 0.3333333 | 0.4344796 | 0.3966165 | 0.3671831 | 0.0120301 |
| pre-matching: v68 | 0.3630908 | 0.1142857 | 0.0208333 | 0.4 | 0.0222222 | 0.0476190 | 0.1450000 | 0.2 | 0.3000000 | 0.4338290 | 0.4165103 | 0.3363515 | 0.0236220 |
| post-matching: v68 | 0.1583799 | 0.0069444 | 0.0700809 | 0.0 | 0.0155440 | 0.0699301 | 0.0545455 | 0.0 | 0.0909091 | 0.3409352 | 0.1929825 | 0.0746248 | 0.0274657 |
The matching yielded a Beta subset that was about as representative of Release for v68 as for v67 across most of the covariates reviewed. However, for some covariates the difference between channels actually increased (e.g., profile_age, default_search_engine), and one, MEMORY_TOTAL, remained distinctly different from Release both before and after matching. MEMORY_TOTAL requires further investigation into why its Beta distribution is spread significantly further toward higher values than the Release distribution.
A hold-out set of performance metrics is not strictly necessary when applying the model across versions. However, research has shown that optimal feature selection for statistical matching is driven by the effects (e.g., the hold-out covariates) rather than by the response (e.g., whether a profile is Beta or Release) that is typical of predictive modeling. Knowing the metrics of concern before matching is therefore key.
This methodology is an initial step towards providing an additional set of Firefox Release Health metrics derived from the Beta release population. Additional work to realize this goal includes: